refactor: convert to mysql values directly from arrow #7096
Conversation
Pull Request Overview
This PR refactors the MySQL writer to convert Arrow arrays directly to MySQL values instead of using intermediate GreptimeDB Value objects, resulting in significant performance improvements for iterating record batches (a minimal sketch of the idea follows the key changes below).
Key changes:
- Replaced Value-based iteration with direct Arrow array processing in the MySQL writer
- Added benchmark tests to demonstrate performance improvements
- Updated dependencies to support new direct conversion approach
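As a rough illustration of the direct-conversion idea (not the PR's actual writer code), the sketch below matches on the Arrow data type and produces a MySQL-side value per cell. The `MysqlCell` enum and `cell_from_arrow` are hypothetical stand-ins for the real row writer; only the Arrow API calls are real.

```rust
use arrow::array::{Array, ArrayRef, AsArray};
use arrow::datatypes::{DataType, Float64Type, Int64Type};

/// Hypothetical MySQL-side representation of a single cell.
#[derive(Debug)]
enum MysqlCell {
    Null,
    Int(i64),
    Double(f64),
    Text(String),
}

/// Convert one Arrow cell straight to a MySQL value, with no intermediate,
/// dynamically typed Value object in between.
fn cell_from_arrow(array: &ArrayRef, i: usize) -> MysqlCell {
    if array.is_null(i) {
        return MysqlCell::Null;
    }
    match array.data_type() {
        DataType::Int64 => MysqlCell::Int(array.as_primitive::<Int64Type>().value(i)),
        DataType::Float64 => MysqlCell::Double(array.as_primitive::<Float64Type>().value(i)),
        DataType::Utf8 => MysqlCell::Text(array.as_string::<i32>().value(i).to_string()),
        // The real writer covers the remaining Arrow types; elided in this sketch.
        other => MysqlCell::Text(format!("<{other:?} not covered here>")),
    }
}
```

Compared with materializing a dynamically typed Value for every cell, this keeps reads on the typed Arrow arrays and avoids per-cell allocations, which is presumably where most of the iteration speed-up comes from.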
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/servers/src/mysql/writer.rs | Complete refactor of record batch processing to use direct Arrow array access instead of Value objects |
| src/servers/Cargo.toml | Added common-decimal dependency for direct decimal handling |
| src/common/recordbatch/benches/iter_record_batch_rows.rs | New benchmark file comparing different record batch iteration approaches (a sketch follows this table) |
| src/common/recordbatch/Cargo.toml | Added criterion benchmark dependency and configuration |
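To make the benchmark row above concrete, here is a self-contained criterion sketch contrasting an owned intermediate value per cell with reading the typed Arrow arrays directly. The names (`sample_batch`, `via_intermediate`, `direct`, `Cell`) and the data are illustrative assumptions, not the contents of the actual iter_record_batch_rows.rs; Cargo.toml would also need a `[[bench]]` target with `harness = false`.

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int64Array, StringArray};
use arrow::record_batch::RecordBatch;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

/// Build a small two-column batch (Int64 + Utf8) to iterate over.
fn sample_batch(rows: usize) -> RecordBatch {
    let ints: ArrayRef = Arc::new(Int64Array::from_iter_values(0..rows as i64));
    let strs: ArrayRef = Arc::new(StringArray::from_iter_values(
        (0..rows).map(|i| format!("row-{i}")),
    ));
    RecordBatch::try_from_iter(vec![("i", ints), ("s", strs)]).unwrap()
}

/// Owned intermediate per cell, loosely simulating a Value-based path.
enum Cell {
    Int(i64),
    Text(String),
}

fn via_intermediate(batch: &RecordBatch) -> usize {
    let ints = batch.column(0).as_any().downcast_ref::<Int64Array>().unwrap();
    let strs = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
    let mut total = 0;
    for row in 0..batch.num_rows() {
        // Every cell becomes an owned value before it is consumed.
        let cells = vec![
            Cell::Int(ints.value(row)),
            Cell::Text(strs.value(row).to_string()),
        ];
        for cell in cells {
            total += match cell {
                Cell::Int(v) => (v % 7) as usize,
                Cell::Text(s) => s.len(),
            };
        }
    }
    total
}

fn direct(batch: &RecordBatch) -> usize {
    let ints = batch.column(0).as_any().downcast_ref::<Int64Array>().unwrap();
    let strs = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
    let mut total = 0;
    for row in 0..batch.num_rows() {
        // Read the typed arrays in place; no per-cell allocation.
        total += (ints.value(row) % 7) as usize + strs.value(row).len();
    }
    total
}

fn bench_iter(c: &mut Criterion) {
    let batch = sample_batch(4096);
    c.bench_function("via_intermediate", |b| b.iter(|| black_box(via_intermediate(&batch))));
    c.bench_function("direct", |b| b.iter(|| black_box(direct(&batch))));
}

criterion_group!(benches, bench_iter);
criterion_main!(benches);
```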
Force-pushed from f74b7a3 to e47fed7.
How will we handle the new JSON type, since there is no concrete JSON type in DataFusion?
Signed-off-by: luofucong <[email protected]>
Force-pushed from e47fed7 to e023788.
I think there will be a "json display" function that can render the new JSON datatype as a plain string, like this:

```rust
DataType::Struct(_) => {
    let struct_array = array.as_struct();
    if column_schema.is_json() {
        let string = json_display(struct_array.value(i));
        writer.write(string);
    } else {
        // same as now
        ..
    }
}
```
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Convert Arrow arrays to MySQL values directly, so that there are no intermediate `Value`s to occupy memory, and the performance of iterating record batches is significantly better; see the benchmark in iter_record_batch_rows.rs.

PR Checklist
Please convert it to a draft if some of the following conditions are not met.